The ELM Neuron: an Efficient and Expressive Cortical Neuron Model Can Solve Long-Horizon Tasks
Traditional large-scale neuroscience models and machine learning utilize
simplified models of individual neurons, relying on collective activity and
properly adjusted connections to perform complex computations. However, each
biological cortical neuron is inherently a sophisticated computational device,
as corroborated in a recent study where it took a deep artificial neural
network with millions of parameters to replicate the input-output relationship
of a detailed biophysical model of a cortical pyramidal neuron. We question whether
so many parameters are necessary and introduce the Expressive Leaky Memory
(ELM) neuron, a biologically inspired, computationally expressive, yet
efficient model of a cortical neuron. Remarkably, our ELM neuron requires only
8K trainable parameters to match the aforementioned input-output relationship
accurately. We find that an accurate model necessitates multiple memory-like
hidden states and intricate nonlinear synaptic integration. To assess the
computational ramifications of this design, we evaluate the ELM neuron on
various tasks with demanding temporal structures, including a sequential
version of the CIFAR-10 classification task, the challenging Pathfinder-X task,
and a new dataset based on the Spiking Heidelberg Digits dataset. Our ELM
neuron outperforms most transformer-based models on the Pathfinder-X task with
77% accuracy, demonstrates competitive performance on Sequential CIFAR-10, and
surpasses classic LSTM models on the new variant of the
Spiking Heidelberg Digits dataset. These findings suggest that biologically
motivated, computationally efficient neuronal models could enhance
performance on challenging machine learning tasks.
Comment: 23 pages, 10 figures, 9 tables, submitted to NeurIPS 202
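The abstract attributes the model's accuracy to multiple memory-like hidden states combined with nonlinear synaptic integration. The following is a minimal sketch of that idea under stated assumptions, not the authors' formulation: each memory unit i is assumed to be a leaky integrator with decay lam_i = exp(-dt / tau_i) that blends its previous value with a nonlinear candidate update, here a single tanh layer over the current input and the previous memory state, standing in for a richer integration network. All names (`leaky_memory_step`, `run`, `w_in`, `w_rec`, `w_out`, `tau`) are hypothetical, and the weights are hand-set rather than trained.

```python
import math
from typing import List, Sequence, Tuple


def leaky_memory_step(
    m_prev: Sequence[float],
    x: Sequence[float],
    w_in: Sequence[Sequence[float]],   # n_mem x n_in input weights (hypothetical)
    w_rec: Sequence[Sequence[float]],  # n_mem x n_mem memory-to-memory weights (hypothetical)
    tau: Sequence[float],              # per-unit memory time constants (hypothetical)
    dt: float = 1.0,
) -> List[float]:
    """One update of a bank of leaky memory units (illustrative, not the paper's equations)."""
    m_new = []
    for i in range(len(m_prev)):
        # Nonlinear integration of the current input and the previous memory state.
        drive = sum(w * xj for w, xj in zip(w_in[i], x))
        drive += sum(w * mj for w, mj in zip(w_rec[i], m_prev))
        candidate = math.tanh(drive)
        # Leaky blend: a larger tau gives lam closer to 1, so the unit forgets more slowly.
        lam = math.exp(-dt / tau[i])
        m_new.append(lam * m_prev[i] + (1.0 - lam) * candidate)
    return m_new


def run(
    inputs: Sequence[Sequence[float]],
    w_in: Sequence[Sequence[float]],
    w_rec: Sequence[Sequence[float]],
    w_out: Sequence[float],
    tau: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Process a sequence step by step; return per-step linear readouts and the final memory."""
    m = [0.0] * len(tau)
    outputs = []
    for x in inputs:
        m = leaky_memory_step(m, x, w_in, w_rec, tau)
        outputs.append(sum(w * mi for w, mi in zip(w_out, m)))
    return outputs, m


# Example: one fast (tau=1) and one slow (tau=50) unit, a single input pulse, then silence.
pulse = [[2.0]] + [[0.0]] * 20
outputs, final_m = run(pulse, w_in=[[1.0], [1.0]], w_rec=[[0.0, 0.0], [0.0, 0.0]],
                       w_out=[1.0, 1.0], tau=[1.0, 50.0])
print(final_m)  # slow unit still holds about 0.013; fast unit has decayed to nearly 0
```

With one fast and one slow unit, the input pulse fades almost immediately from the fast unit but persists in the slow one, illustrating why several memory units with different time constants can retain information over longer horizons.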
Discrete Key-Value Bottleneck
Deep neural networks perform well on prediction and classification tasks in
the canonical setting where data streams are i.i.d., labeled data is abundant,
and class labels are balanced. Challenges emerge with distribution shifts,
including non-stationary or imbalanced data streams. One powerful approach that
has addressed this challenge involves self-supervised pretraining of large
encoders on vast amounts of unlabeled data, followed by task-specific tuning. Given
a new task, updating the weights of these encoders is challenging because a large
number of weights must be fine-tuned, and as a result the encoders forget
information about previous tasks. In the present work, we propose a model
architecture to address this issue, building upon a discrete bottleneck
containing pairs of separate and learnable (key, value) codes. In this setup,
we follow an encode, process (via the discrete bottleneck), and decode
paradigm: the input is fed to the pretrained encoder, the output
of the encoder is used to select the nearest keys, and the corresponding values
are fed to the decoder to solve the current task. The model can only fetch and
re-use a limited number of these (key, value) pairs during inference, enabling
localized and context-dependent model updates. We theoretically investigate the
ability of the proposed model to minimize the effect of the distribution shifts
and show that such a discrete bottleneck with (key, value) pairs reduces the
complexity of the hypothesis class. We empirically verify the proposed
method's benefits under challenging distribution-shift scenarios across various
benchmark datasets and show that the proposed model reduces the common
vulnerability to non-i.i.d. and non-stationary training distributions compared
with various other baselines.
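The encode, select-nearest-key, fetch-value, decode pipeline described above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the abstract does not fix the distance metric, how many pairs are fetched, or how values are trained, so it assumes squared Euclidean distance, averaging of the k fetched values, and a simple move-toward-target rule as a hypothetical stand-in for gradient-based training. The encoder and decoder are omitted (the encoder output `z` is given directly), and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_keys(z: Sequence[float], keys: Sequence[Sequence[float]], k: int = 1) -> List[int]:
    """Indices of the k keys closest to encoder output z (assumed metric: squared Euclidean)."""
    order = sorted(range(len(keys)), key=lambda i: sq_dist(z, keys[i]))
    return order[:k]


def bottleneck_forward(
    z: Sequence[float], keys: Sequence[Sequence[float]], values: List[Vector], k: int = 1
) -> Tuple[Vector, List[int]]:
    """Select the k nearest keys and return the mean of their paired values (the decoder's input)."""
    idx = nearest_keys(z, keys, k)
    dim = len(values[0])
    fetched = [sum(values[i][d] for i in idx) / len(idx) for d in range(dim)]
    return fetched, idx


def local_update(values: List[Vector], idx: List[int], target: Sequence[float], lr: float = 0.5) -> None:
    """Move only the fetched values toward a target; values of unselected keys stay unchanged.
    Hypothetical stand-in for gradient-based training of the values."""
    for i in idx:
        values[i] = [v + lr * (t - v) for v, t in zip(values[i], target)]


# Example with 2-D encoder outputs and 2-D values (all numbers are illustrative).
keys = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
z = [0.9, 1.2]                                  # pretend encoder output
fetched, idx = bottleneck_forward(z, keys, values, k=1)
print(idx, fetched)                             # [1] [0.0, 1.0]
local_update(values, idx, target=[1.0, 0.0])
print(values)                                   # [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
```

Because only the fetched values change, an update driven by one input leaves the values paired with distant keys untouched, which is the localized, context-dependent updating the abstract describes.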
A General Purpose Neural Architecture for Geospatial Systems
Geospatial Information Systems are used by researchers and Humanitarian
Assistance and Disaster Response (HADR) practitioners to support a wide variety
of important applications. However, collaboration between these actors is
difficult due to the heterogeneous nature of geospatial data modalities (e.g.,
multi-spectral images of various resolutions, time series, weather data) and the
diversity of tasks (e.g., regressing human activity indicators or detecting
forest fires). In this work, we present a roadmap towards the construction of a
general-purpose neural architecture (GPNA) with a geospatial inductive bias,
pre-trained on large amounts of unlabelled earth observation data in a
self-supervised manner. We outline how such a model could facilitate cooperation
between members of the community. We show preliminary results on the first step
of the roadmap, where we instantiate an architecture that can process a wide
variety of geospatial data modalities and demonstrate that it can achieve
competitive performance with domain-specific architectures on tasks relating to
the U.N.'s Sustainable Development Goals.
Comment: Presented at AI + HADR Workshop at NeurIPS 202